What is claimed is: 

1. A method for balancing the load of a parallel processing system having a plurality of
parallel processing elements arranged in a loop, wherein each processing element (PEr) has a
local number of tasks associated therewith, wherein r represents the number for a selected 
processing element, and wherein each of said processing elements is operable to communicate 
with a clockwise adjacent processing element and with an anti-clockwise adjacent processing 
element, the method comprising: 

determining within each processing element a total number of tasks present 
within said loop; 

calculating a local mean number of tasks within each of said plurality of 
processing elements; 

calculating a local deviation within each of said plurality of processing elements; 

determining a sum deviation within each of said processing elements for one-half 
said loop in an anti-clockwise direction, said one-half of said loop being relative to 
each of said selected processing elements; 

determining a sum deviation within each of said processing elements for one-half 
said loop in a clockwise direction, said one-half of said loop being relative to each of 
said selected processing elements; 

determining a clockwise transfer parameter and an anti-clockwise transfer 
parameter within each of said processing elements; and 

redistributing tasks among said plurality of processing elements in response to 
said clockwise transfer parameters and said anti-clockwise parameters within each of 
said plurality of processing elements. 

2. The method of claim 1 wherein said determining within each of said processing 
elements a total number of tasks present within said loop, comprises: 

transmitting said local number of tasks associated with each of said processing 
elements to each other of said plurality of processing elements within said loop; 

receiving within each of said processing elements said number of local tasks 
associated with said each other of said plurality of processing elements; and 

summing said number of local tasks associated with each of said processing 
elements with said number of local tasks associated with each other of said plurality 
of processing elements. 
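
The transmit, receive, and sum steps of claim 2 can be illustrated with a small simulation. The following Python sketch is an illustration only, not the claimed implementation. It assumes the loop is modelled as a list, that index (r + 1) mod N is the clockwise neighbour of PE r, and that every PE forwards one value per step. The name ring_total is hypothetical.

```python
def ring_total(local_counts):
    """Simulate every PE learning the loop total by passing counts clockwise."""
    n = len(local_counts)
    totals = list(local_counts)     # each PE starts with its own local count
    in_flight = list(local_counts)  # value each PE will forward next
    for _ in range(n - 1):
        # All PEs forward at once: PE r receives what PE (r - 1) held.
        in_flight = [in_flight[(r - 1) % n] for r in range(n)]
        # Each PE adds the value it just received to its running sum.
        totals = [totals[r] + in_flight[r] for r in range(n)]
    return totals


counts = [3, 6, 2, 1, 7, 0, 7, 6]
print(ring_total(counts))  # every entry is 32, the loop total
```

After N - 1 forwarding steps in this model, each PE has seen every other PE's count exactly once, so all PEs hold the same total.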

3. The method of claim 1 wherein said determining said total number of tasks present
within said loop includes solving the equation $V = \sum_{i=0}^{N-1} v_i$, where N represents the number of
processing elements in said loop, and $v_i$ represents said local number of tasks associated with
an i-th processing element in said loop.

4. The method of claim 1 wherein said calculating a local mean number of tasks within 
each of said plurality of processing elements includes solving the equation 

$M_r = \mathrm{Trunc}((V + E_r)/N)$, where Mr is said local mean for PEr, N is the total number of
processing elements in said loop, and Er is a number in the range of 0 to (N - 1) and wherein
each processing element has a different Er value. 

5. The method of claim 3 wherein Er controls said Trunc function such that said total 
number of tasks for said loop is equal to the sum of the local mean number of tasks for each
of said plurality of processing elements in said loop (i.e., $V = \sum_{i=0}^{N-1} M_i$).
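
The way distinct Er values make the truncated local means add up to the loop total can be checked numerically. The sketch below is an illustration under stated assumptions: PE r is given Er = r (one way to assign a different value in 0 to N - 1 to each PE), V is non-negative so that integer floor division matches Trunc, and the name local_means is hypothetical.

```python
def local_means(total, n):
    """Local mean for each PE r, using the assumed assignment E_r = r."""
    # For non-negative totals, // (floor division) equals truncation.
    return [(total + r) // n for r in range(n)]


means = local_means(43, 8)
print(means)       # [5, 5, 5, 5, 5, 6, 6, 6]
print(sum(means))  # 43
```

With 43 tasks on 8 PEs, five PEs get a mean of 5 and three get a mean of 6, so no task is lost to truncation.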

6. The method of claim 3 wherein said local mean $M_r = \mathrm{Trunc}((V + E_r)/N)$ for each
local PEr within said loop is equal to one of ^and

7. The method of claim 1 wherein said calculating a local deviation within each of said 
plurality of processing elements, comprises finding the difference between said local number 
of tasks and said local mean number for each of said plurality of processing elements. 
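
The local deviation of claim 7 is a per-PE subtraction. The following sketch is an illustration only: it assumes Er = r for the local mean, non-negative task counts, and the hypothetical name local_deviations.

```python
def local_deviations(counts):
    """Deviation of each PE: local task count minus local mean."""
    n = len(counts)
    total = sum(counts)
    # Assumed E_r = r, so the local means sum exactly to the total.
    means = [(total + r) // n for r in range(n)]
    return [counts[r] - means[r] for r in range(n)]


dev = local_deviations([3, 6, 2, 1, 7, 0, 7, 7])
print(dev)       # [-1, 2, -2, -3, 3, -4, 3, 2]
print(sum(dev))  # 0
```

Because the local means sum to the loop total, the deviations sum to zero: every surplus on one PE is matched by a deficit elsewhere.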

8. The method of claim 1 wherein said determining a sum deviation within each of said 
processing elements for one-half of said loop in an anti-clockwise direction comprises:

transmitting said local deviation associated with each of said processing elements 
half way around said loop in an anti-clockwise direction, said one-half of said loop 
being relative to each of said selected processing elements; 

receiving said local deviation associated with each other of said processing 
elements half way around said loop in a clockwise direction, said one-half of said 
loop being relative to each of said selected processing elements; and 

summing said local deviations associated with each other of said processing 
elements half way around said loop in a clockwise direction. 

9. The method of claim 1 wherein said determining a sum deviation within each of said
processing elements in one-half of said loop in a clockwise direction comprises: 

transmitting said local deviation associated with each of said processing elements 
half way around said loop in a clockwise direction, said one-half of said loop being
relative to each of said selected processing elements; 

receiving said local deviation associated with each other of said processing 
elements half way around said loop in an anti-clockwise direction, said one-half of said
loop being relative to each of said selected processing elements; and 

summing said local deviations associated with each other of said processing 
elements halfway around said loop in an anti-clockwise direction.
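
The half-loop sums of claims 8 and 9 can be sketched as two streams of deviations moving in opposite directions around the loop. This Python sketch is an illustration under several assumptions not fixed by the claims: N is even, index (r + 1) mod N is the clockwise neighbour of PE r, and "half way" means N/2 forwarding steps, so the diametrically opposite PE is reached from both directions. The name half_loop_sums is hypothetical.

```python
def half_loop_sums(dev):
    """Return (A, C): per-PE sums of deviations over each half of the loop."""
    n = len(dev)                # assumed even
    steps = n // 2
    moving_cw = list(dev)       # copies travelling clockwise
    moving_acw = list(dev)      # copies travelling anti-clockwise
    a = [0] * n                 # sum over the anti-clockwise half
    c = [0] * n                 # sum over the clockwise half
    for _ in range(steps):
        # PE r receives from its anti-clockwise neighbour (r - 1) ...
        moving_cw = [moving_cw[(r - 1) % n] for r in range(n)]
        # ... and from its clockwise neighbour (r + 1).
        moving_acw = [moving_acw[(r + 1) % n] for r in range(n)]
        a = [a[r] + moving_cw[r] for r in range(n)]
        c = [c[r] + moving_acw[r] for r in range(n)]
    return a, c


dev = [-1, 2, -2, -3, 3, -4, 3, 2]
a, c = half_loop_sums(dev)
print(a[0], c[0])  # 4 0
```

Under this assumption the opposite PE appears in both sums, so it cancels in any expression that uses only the difference between the two sums.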

10. The method of claim 1 wherein said determining a clockwise transfer parameter and 
an anti-clockwise transfer parameter within each of said processing elements comprises: 

setting the clockwise transfer parameter equal to $(2S + A - C) \div 4$; and
setting the anti-clockwise transfer parameter equal to $(2S + C - A) \div 4$, where S
represents the deviation of a selected processing element; C represents said sum 
deviation in a clockwise half of loop relative to said selected processing element, and 
A represents said sum deviation in an anti-clockwise half of loop relative to said 
selected processing element. 
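
The two expressions of claim 10 can be evaluated directly. The sketch below is an illustration only; the name transfer_parameters is hypothetical, and the claim does not state how a negative or fractional result is to be interpreted, so the code simply returns the values.

```python
def transfer_parameters(s, a, c):
    """Clockwise and anti-clockwise transfer parameters for one PE."""
    t_c = (2 * s + a - c) / 4  # clockwise transfer parameter
    t_a = (2 * s + c - a) / 4  # anti-clockwise transfer parameter
    return t_c, t_a


t_c, t_a = transfer_parameters(s=-1, a=4, c=0)
print(t_c, t_a)   # 0.5 -1.5
print(t_c + t_a)  # -1.0
```

Adding the two expressions gives (4S) / 4 = S, so the two parameters always account for exactly the local deviation of the selected processing element.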

11. The method of claim 1 wherein said determining a clockwise transfer parameter and 
an anti-clockwise transfer parameter within each of said processing elements comprises at 
least one of: 

setting the clockwise transfer parameter equal to $\mathrm{Trunc}[(2S + A) \div 4]$ and setting
the anti-clockwise transfer parameter equal to $S - T_c$; and

setting the anti-clockwise transfer parameter equal to $\mathrm{Trunc}[(2S - A) \div 4]$ and
setting the clockwise transfer parameter equal to $S - T_a$;
where A = Mag, if A > Mag, where A = -Mag, if A < -Mag, where Mag = abs(2S), and where
S represents the local deviation of a selected processing element. 
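
The two alternatives of claim 11 differ only in which parameter is truncated first. The sketch below is an illustration under assumptions: Trunc is taken to round toward zero (Python's math.trunc), A is supplied by the caller because the claim defines it only through the clamp, and both function names are hypothetical.

```python
import math


def clamp_a(s, a):
    """Limit A to the range [-Mag, Mag], where Mag = abs(2S)."""
    mag = abs(2 * s)
    return max(-mag, min(mag, a))


def clockwise_first(s, a):
    """First alternative: truncate T_c, then T_a = S - T_c."""
    a = clamp_a(s, a)
    t_c = math.trunc((2 * s + a) / 4)
    return t_c, s - t_c


def anticlockwise_first(s, a):
    """Second alternative: truncate T_a, then T_c = S - T_a."""
    a = clamp_a(s, a)
    t_a = math.trunc((2 * s - a) / 4)
    return s - t_a, t_a


print(clockwise_first(3, 10))       # (3, 0): A is clamped from 10 to 6
print(clockwise_first(3, -1))       # (1, 2)
print(anticlockwise_first(3, -1))   # (2, 1)
```

In both alternatives the second parameter is obtained by subtraction, so the pair is integral and always sums to S. With A clamped to abs(2S), the truncated parameter lies between 0 and S, so neither parameter has a sign opposite to S.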

12. A method for reassigning tasks among an odd numbered plurality of processing 
elements within a parallel processing system, said processing elements being connected in a
loop and each having a local number of tasks associated therewith, the method comprising: 

determining the total number of tasks on said loop; 

computing a local mean value for a selected processing element; 

computing a local deviation for said selected processing element, said local
deviation representative of the difference between said local number of tasks for said 
selected processing element and said local mean value for said selected processing 
element; 

inserting a phantom processing element within said loop; 

summing said deviation of said processing elements located within one-half of 
the loop in an anti-clockwise direction relative to said selected processing element; 

summing said deviation of said processing elements located within one-half of 
the loop in a clockwise direction relative to said selected processing element; 

computing a number of tasks to transfer in a clockwise direction for said selected 
processing element; 

computing a number of tasks to transfer in an anti-clockwise direction for said 
selected processing element; and 

reassigning tasks relative to the said number of tasks to transfer in a clockwise 
direction and said number of tasks to transfer in an anti-clockwise direction.

13. The method of claim 12 wherein said determining the total number of tasks on said 
loop, comprises: 

transmitting said local number of tasks associated with each of said processing 
elements to each other of said plurality of processing elements within said loop; 

receiving within each of said processing elements said number of local tasks 
associated with said each other of said plurality of processing elements; and 

summing said number of local tasks associated with each of said processing 
elements with said number of local tasks associated with each other of said plurality 
of processing elements. 

14. The method of claim 12 wherein computing a local mean value for a selected 
processing element includes solving the equation $M_r = \mathrm{Trunc}((V + E_r)/N)$, where Mr is
said local mean for a PEr, N is the total number of processing elements in said loop, and Er is
a number in the range of 0 to (N - 1).

15. The method of claim 14 wherein Er controls said Trunc function such that said total 
number of tasks for said loop is equal to the sum of the local mean number of tasks for each 
of said plurality of processing elements in said loop and wherein each processing element has 
a different Er value assigned. 

16. The method of claim 12 wherein said inserting a phantom processing element within
said loop further comprises: 

locating said phantom processing element in a position within said loop that is 
diametrically opposed to said processing element; and 

assigning a zero deviation value to said phantom processing element. 
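
The effect of the phantom processing element on the half-loop sums can be sketched for a single selected PE. This is an illustration under assumptions not fixed by the claims: the loop is a list in clockwise order, each half spans half of the padded (even-sized) loop, and the name half_sums_with_phantom is hypothetical.

```python
def half_sums_with_phantom(dev, r):
    """Return (A, C) for PE r in an odd-sized loop padded with a phantom PE."""
    n = len(dev)
    assert n % 2 == 1, "this sketch covers odd-sized loops only"
    # View the loop clockwise starting from the selected PE at position 0.
    ring = [dev[(r + k) % n] for k in range(n)]
    # Insert the phantom, with zero deviation, diametrically opposite
    # position 0 in the padded loop of n + 1 elements.
    ring.insert((n + 1) // 2, 0)
    steps = (n + 1) // 2
    c = sum(ring[k] for k in range(1, steps + 1))   # clockwise half
    a = sum(ring[-k] for k in range(1, steps + 1))  # anti-clockwise half
    return a, c


dev = [2, -1, 0, 3, -2, -1, -1]
print(half_sums_with_phantom(dev, 0))  # (-4, 2)
print(half_sums_with_phantom(dev, 3))  # (1, -4)
```

Because the phantom contributes zero, each half effectively sums the (N - 1) / 2 real PEs on that side, while the loop can be processed with the same half-way step count as an even-sized loop.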

17. The method of claim 12 wherein said computing a local mean value for a selected 
processing element, said computing a local deviation for said selected processing element, 
said inserting a phantom processing element within said loop, said summing said deviation of 
said processing elements located within one-half of the loop in an anti-clockwise direction, 
summing said deviation of said processing elements located within one-half of the loop in a 
clockwise direction, computing a number of tasks to transfer in a clockwise direction for said 
selected processing element, computing a number of tasks to transfer in an anti-clockwise 
direction for said selected processing element, and reassigning tasks relative to the said 
number of tasks to transfer in a clockwise direction and said number of tasks to transfer in an
anti-clockwise direction are completed simultaneously for each of said plurality of processing 
elements within said loop. 

18. The method of claim 12 wherein said summing said deviation of said processing 
elements located within one-half of the loop in an anti-clockwise direction relative to said 
selected processing element comprises: 

transmitting said local deviation associated with each of said processing elements 
halfway around said loop in an anti-clockwise direction, said one-half of said loop 
being relative to each of said selected processing elements; 

receiving said local deviation associated with each other of said processing 
elements halfway around said loop in a clockwise direction, said one-half of said 
loop being relative to each of said selected processing elements; and 

summing said local deviations associated with each other of said processing 
elements half way around said loop in a clockwise direction. 

19. The method of claim 12 wherein summing said deviation of said processing elements 
located within one-half of the loop in a clockwise direction relative to said selected processing 
element comprises: 

transmitting said local deviation associated with each of said processing elements 
half way around said loop in a clockwise direction, said one-half of said loop being
relative to each of said selected processing elements; 

receiving said local deviation associated with each other of said processing
elements halfway around said loop in an anti-clockwise direction, said one-half of said
loop being relative to each of said selected processing elements; and 

summing said local deviations associated with each other of said processing 
elements half way around said loop in an anti-clockwise direction. 

20. A memory device carrying a set of instructions which, when executed, perform a 
method comprising: 

determining within each processing element a total number of tasks present 
within said loop; 

calculating a local mean number of tasks within each of said plurality of 
processing elements; 

calculating a local deviation within each of said plurality of processing elements; 

determining a sum deviation within each of said processing elements for one-half 
said loop in an anti-clockwise direction, said one-half of said loop being relative to 
each of said selected processing elements; 

determining a sum deviation within each of said processing elements for one-half
said loop in a clockwise direction, said one-half of said loop being relative to each of 
said selected processing elements; 

determining a clockwise transfer parameter and an anti-clockwise transfer 
parameter within each of said processing elements; and 

redistributing tasks among said plurality of processing elements in response to 
said clockwise transfer parameters and said anti-clockwise parameters within each of 
said plurality of processing elements. 